Metadata Management for Large Statistical Databases

Author

  • John L. McCarthy
Abstract

Data description, or metadata, presents a significant database management challenge, particularly for scientific and statistical databases. Ideally, we would like to access and manipulate data and metadata using the same DBMS tools, but few systems even begin to provide such integrated capabilities. This paper outlines a framework for more integrated metadata management by synthesizing ideas from statistical analysis, bibliographic retrieval, data dictionary, and database management systems. Drawing on experience and examples from a large statistical database project, the paper discusses and analyzes:

  • general types and uses of data about data
  • special types of metadata for statistical databases
  • metadata structure and characteristics
  • principles and requirements for metadata management

Similar resources

Statistical Computing and Databases: Distributed Computing Near the Data

This paper addresses the following question: “how do we fit statistical models efficiently with very large data sets that reside in databases?” Nowadays it is quite common to encounter a situation where a very large data set is stored in a database, yet the statistical analysis is performed with a separate piece of software such as R. Usually it does not make much sense and in some cases it ...

Improved emotion recognition with large set of statistical features

This paper presents and discusses speaker-dependent emotion recognition with a large set of statistical features. Speaker-dependent emotion recognition currently achieves the best accuracy. Recognition was performed on the English, Slovenian, Spanish, and French InterFace emotional speech databases. All databases include 9 speakers. The InterFace databases include neutral speaking s...

Scaling EM (Expectation-Maximization) Clustering to Large Databases

Practical statistical data clustering algorithms require multiple data scans to converge. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework requiring at most one scan of the database, and apply it to the Expectation-Maximization (EM) algorithm. Unlike distance-based or hard membership algorithms (such as k-Means) EM is known to be an app...

Conceptual Clustering of Heterogeneous Distributed Databases

With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...

Clustering of highly homologous sequences to reduce the size of large protein databases

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560,000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive datab...

Publication date: 1998